Relational database management system having integrated non-relational multi-dimensional data store of aggregated data elements

ABSTRACT

Improved method of and apparatus for joining and aggregating data elements integrated within a relational database management system (RDBMS) using a non-relational multi-dimensional data structure (MDD). The improved RDBMS system of the present invention can be used to realize achieving a significant increase in system performance (e.g. deceased access/search time), user flexibility and ease of use. The improved RDBMS system of the present invention can be used to realize an improved Data Warehouse for supporting on-line analytical processing (OLAP) operations or to realize an improved informational database system or the like.

RELATED CASES

[0001] This is a Continuation-in-part of: copending application Ser. No.09/514,611 entitled “Stand-Alone Cartridge-Style Data Aggregation Serverand Method of and System for Managing Multi-Dimensional Databases usingthe Same” filed Feb. 28, 2000, and copending application Ser. No.09/368,241 entitled “Method Of And System For Managing Multi-DimensionalDatabases Using Modular-Arithmetic Based Address Data Mapping Processes”filed Aug. 4, 1999; said Applications being commonly owned by HyperRollIsrael, Limited, herein incorporated by reference in their entirety.

BACKGROUND OF THE INVENTION

[0002] 1. Field of Invention

[0003] The present invention relates generally to multi-dimensionalrelational databases and, more specifically to mechanisms foraggregating data elements in a multi-dimensional relational databasesystem and for processing queries on such aggregated data elements, andalso to informational database systems that utilize multi-dimensionalrelational databases and such aggregation/query mechanisms.

[0004] 2. Brief Description Of The State Of The Art

[0005] Information technology (IT) enables an enterprise to manage andoptimize its internal business practices through the analysis andsharing of data internally within the enterprise. In addition, ITenables an enterprise to manage and optimize its external businesspractices through the sharing of data with external parties such assuppliers, customers and investors, and through on-line transactionsbetween the enterprise and external parties. Informational databasesystems (systems that store data, support query processing on the storeddata, and possibly support analysis of the stored data) play a centralrole in many different parts of today's IT systems.

[0006]FIG. 1 illustrates exemplary domains where informational databasesystems are used. As shown, an operational environment 10 generates datawhich is stored in a data store 22 in the informational database system20. These domains include data analysis systems (spread-sheet modelingprograms, snap-shots, extraction, denormalization), data warehousing,data marts, OLAP systems, data mining systems, electroniccommerce-enabled web servers, and business-to-business exchanges. Modeminformational database systems typically use a relational databasemanagement system (RDBMS) as a repository for storing the data andquerying the data.

[0007]FIG. 2 illustrates a data warehouse-OLAP domain that utilizes theprior art approaches described above. The data warehouse is anenterprise-wide data store. It is becoming an integral part of manyinformation delivery systems because it provides a single, centrallocation where a reconciled version of data extracted from a widevariety of operational systems is stored. Details on methods of dataintegration and constructing data warehouses can be found in the whitepaper entitled “Data Integration: The Warehouse Foundation” by LouisRollleigh and Joe Thomas, published athttp://www.acxiom.com/whitepapers/wP-11.asp. Building a Data Warehousehas its own special challenges (e.g. using common data model, commonbusiness dictionary, etc.) and is a complex endeavor. However, justhaving a Data Warehouse does not provide organizations with theoften-heralded business benefits of data warehousing. To complete thesupply chain from transactional systems to decision maker, organizationsneed to deliver systems that allow knowledge workers to make strategicand tactical decisions based on the information stored in thesewarehouses. These decision support systems are referred to as On-LineAnalytical Processing (OLAP) systems. Such OLAP systems are commonlyclassified as Relation OLAP systems or Multi-Dimensional OLAP systems.

[0008] The Relational OLAP (ROLAP) system accesses data stored in a DataWarehouse to provide OLAP analyses. The premise of ROLAP is that OLAPcapabilities are best provided directly against the relational database,i.e. the Data Warehouse. The ROLAP architecture was invented to enabledirect access of data from Data Warehouses, and therefore supportoptimization techniques to meet batch window requirements and providefast response times. Typically, these optimization techniques includeapplication-level table partitioning, pre-aggregate inferencing,denormalization support, and the joining of multiple fact tables.

[0009] A typical ROLAP system has a three-tier or layer client/serverarchitecture. The “database layer” utilizes relational databases fordata storage, access, and retrieval processes. The “application logiclayer” is the ROLAP engine which executes the multidimensional reportsfrom multiple users. The ROLAP engine integrates with a variety of“presentation layers,” through which users perform OLAP analyses. Afterthe data model for the data warehouse is defined, data from on-linetransaction-processing (OLTP) systems is loaded into the relationaldatabase management system (RDBMS). If required by the data model,database routines are run to pre-aggregate the data within the RDBMS.Indices are then created to optimize query access times. End userssubmit multidimensional analyses to the ROLAP engine, which thendynamically transforms the requests into SQL execution plans. The SQLexecution plans are submitted to the relational database for processing,the relational query results are cross-tabulated, and a multidimensionalresult data set is returned to the end user. ROLAP is a fully dynamicarchitecture capable of utilizing pre-calculated results when they areavailable, or dynamically generating results from the raw informationwhen necessary.

[0010] The Multidimensional OLAP (MOLAP) systems utilize a MDD or “cube”to provide OLAP analyses. The main premise of this architecture is thatdata must be stored multidimensionally to be accessed and viewedmultidimensionally. Such non-relational MDD data structures typicallycan be queried by users to enable the users to “slice and dice” theaggregated data. As shown in FIG. 2, such MOLAP systems have anAggregation module which is responsible for all data storage, access,and retrieval processes, including data aggregation (i.e.pre-aggregation) in the MDDB, and an analytical processing and GUImodule responsible for interfacing with a user to provide analyticalanalysis, query input, and reporting of query results to the user.

[0011] A more detailed description of the data warehouse and OLAPenvironment may be found in copending U.S. patent application Ser. No.09/514,611 to R. Bakalash, G. Shaked, and J. Caspi, commonly assigned toHyperRoll Israel, Limited, incorporated by reference above in itsentirety.

[0012] In a RDBMS, users view data stored in tables. By contrast, usersof a non-relation database system can view other data structures, eitherinstead of or in addition to the tables of the RDBMS system. FIG. 3Aillustrates an exemplary table in an RDBMS; and FIGS. 3B and 3Cillustrate operators (queries) on the table of FIG. 3A, and the resultof such queries, respectively. The operators illustrated in FIGS. 3B and3C are expressed as Structured Query Language (SQL) statements as isconventional in the art.

[0013] The choice of using a RDBMS as the data repository in informationdatabase systems naturally stems from the realities of SQLstandardization, the wealth of RDBMS-related tools, and readilyavailable expertise in RDBMS systems. However, the querying component ofRDBMS technology suffers from performance and optimization problemsstemming from the very nature of the relational data model. Morespecifically, during query processing, the relational data modelrequires a mechanism that locates the raw data elements that match thequery. Moreover, to support queries that involve aggregation operations,such aggregation operations must be performed over the raw data elementsthat match the query. For large multidimensional databases, a naiveimplementation of these operations involves computational intensivetable scans that leads to unacceptable query response times.

[0014] In order to better understand how the prior art has approachedthis problem, it will be helpful to briefly describe the relationaldatabase model. According to the relational database model, a relationaldatabase is represented by a logical schema and tables that implementthe schema. The logical schema is represented by a set of templates thatdefine one or more dimensions (entities) and attributes associated witha given dimension. The attributes associated with a given dimensionincludes one or more attributes that distinguish it from every otherdimension in the database (a dimension identifier). Relationshipsamongst dimensions are formed by joining attributes. The data structurethat represents the set of templates and relations of the logical schemais typically referred to as a catalog or dictionary. Note that thelogical schema represents the relational organization of the database,but does not hold any fact data per se. This fact data is stored intables that implement the logical schema.

[0015] Star schemas are frequently used to represent the logicalstructure of a relational database. The basic premise of star schemas isthat information can be classified into two groups: facts anddimensions. Facts are the core data elements being analyzed. Forexample, units of individual item sold are facts, while dimensions areattributes about the facts. For example, dimensions are the producttypes purchased and the data purchase. Business questions against thisschema are asked looking up specific facts (UNITS) through a set ofdimensions (MARKETS, PRODUCTS, PERIOD). The central fact table istypically much larger than any of its dimension tables.

[0016] An exemplary star schema is illustrated in FIG. 4A for suppliers(the “Supplier” dimension) and parts (the “Parts” dimension) over timeperiods (the “Time-Period” dimension). It includes a central fact table“Supplied-Parts” that relates to multiple dimensions—the “Supplier”,“Parts” and “Time-Period” dimensions. FIG. 4B illustrates the tablesused to implement the star schema of FIG. 4A. More specifically, thesetables include a central fact table and a dimension table for eachdimension in the logical schema of FIG. 4A. A given dimension tablestores rows (instances) of the dimension defined in the logical schema.For the sake of description, FIG. 4B illustrates the dimension table forthe “Time-Period” dimension only. Similar dimension tables for the“Supplier” and “Part” dimensions (not shown) are also included in suchan implementation. Each row within the central fact table includes amulti-part key associated with a set of facts (in this example, a numberrepresenting a quantity). The multipart key of a given row (valuesstored in the S#, P#, TP# fields as shown) points to rows (instances)stored in the dimension tables described above. A more detaileddescription of star schemas and the tables used to implement starschemas may be found in C. J. Date, “An Introduction to DatabaseSystems,” Seventh Edition, Addison-Wesley, 2000, pp. 711-715, hereinincorporated by reference in its entirety.

[0017] When processing a query, the tables that implement the schema areaccessed to retrieve the facts that match the query. For example, in astar schema implementation as described above, the facts are retrievedfrom the central fact table and/or the dimension tables. Locating thefacts that match a given query involves one or more join operations.Moreover, to support queries that involve aggregation operations, suchaggregation operations must be performed over the facts that match thequery. For large multi-dimensional databases, a naive implementation ofthese operations involves computational intensive table scans thattypically leads to unacceptable query response times. Moreover, sincethe fact tables are pre-summarized and aggregated along businessdimensions, these tables tend to be very large. This point becomes animportant consideration of the performance issues associated with starschemas. A more detailed discussion of the performance issues (andproposed approaches that address such issues) related to joining andaggregation of star schema is now set forth.

[0018] The first performance issue arises from computationally intensivetable scans that are performed by a naive implementation of datajoining. Indexing schemes may be used to bypass these scans whenperforming joining operations. Such schemes include B-tree indexing,inverted list indexing and aggregate indexing. A more detaileddescription of such indexing schemes can be found in “The Art ofIndexing”, Dynamic Information Systems Corporation, October 1999,available at http://www.disc.com/artindex.pdf. All of these indexingschemes replaces table scan operations (involved in locating the dataelements that match a query) with one ore more index lookup operation.Inverted list indexing associates an index with a group of dataelements, and stores (at a location identified by the index) a group ofpointers to the associated data elements. During query processing, inthe event that the query matches the index, the pointers stored in theindex are used to retrieve the corresponding data elements pointedtherefrom. Aggregation indexing integrates an aggregation index with aninverted list index to provide pointers to raw data elements thatrequire aggregation, thereby providing for dynamic summarization of theraw data elements that match the user-submitted query.

[0019] These indexing schemes are intended to improve join operations byreplacing table scan operations with one or more index lookup operationin order to locate the data elements that match a query. However, theseindexing schemes suffer from various performance issues as follows:

[0020] Since the tables in the star schema design typically contain theentire hierarchy of attributes (e.g. in a PERIOD dimension, thishierarchy could be day>week>month>quarter>year), a multipart key of day,week, month, quarter, year has to be created; thus, multiple meta-datadefinitions are required (one of each key component) to define a singlerelationship; this adds to the design complexity, and sluggishness inperformance.

[0021] Addition or deletion of levels in the hierarchy will requirephysical modification of the fact table, which is time consuming processthat limits flexibility.

[0022] Carrying all the segments of the compound dimensional key in thefact table increases the size of the index, thus impacting bothperformance and scalability.

[0023] Another performance issue arises from dimension tables thatcontain multiple hierarchies. In such cases, the dimensional table oftenincludes a level of hierarchy indicator for every record. Everyretrieval from fact table that stores details and aggregates must usethe indicator to obtain the correct result, which impacts performance.The best alternative to using the level indicator is the snowflakeschema. In this schema aggregate tables are created separately from thedetail tables. In addition to the main fact tables, snowflake schemacontains separate fact tables for each level of aggregation. Notably,the snowflake schema is even more complicated than a star schema, andoften requires multiple SQL statements to get the results that arerequired.

[0024] Another performance issue arises from the pairwise join problem.Traditional RDBMS engines are not design for the rich set of complexqueries that are issued against a star schema. The need to retrieverelated information from several tables in a single query—“joinprocessing”—is severely limited. Many RDBMSs can join only two tables ata time. If a complex join involves more than two tables, the RDBMS needsto break the query into a series of pairwise joins. Selecting the orderof these joins has a dramatic performance impact. There are optimizersthat spend a lot of CPU cycles to find the best order in which toexecute those joins. Unfortunately, because the number of combinationsto be evaluated grows exponentially with the number of tables beingjoined, the problem of selecting the best order of pairwise joins rarelycan be solved in a reasonable amount of time.

[0025] Moreover, because the number of combinations is often too large,optimizers limit the selection on the basis of a criterion of directlyrelated tables. In a star schema, the fact table is the only tabledirectly related to most other tables, meaning that the fact table is anatural candidate for the first pairwise join. Unfortunately, the facttable is the very largest table in the query, so this strategy leads toselecting a pairwise join order that generates a very large intermediateresult set, severely affecting query performance.

[0026] This is an optimization strategy, typically referred to asCartesian Joins, that lessens the performance impact of the pairwisejoin problem by allowing joining of unrelated tables. The join to thefact table, which is the largest one, is deferred until the very end,thus reducing the size of intermediate result sets. In a join of twounrelated tables every combination of the two tables' rows is produced,a Cartesian product. Such a Cartesian product improves queryperformance. However, this strategy is viable only if the Cartesianproduct of dimension rows selected is much smaller than the number ofrows in the fact table. The multiplicative nature of the Cartesian joinmakes the optimization helpful only for relatively small databases.

[0027] In addition, systems that exploit hardware and softwareparallelism have been developed that lessens the performance issues setforth above. Parallelism can help reduce the execution time of a singlequery (speed-up), or handle additional work without degrading executiontime (scale-up).). For example, Red Brick™ has developed STARjoin™technology that provides high speed, parallelizable multi-table joins ina single pass, thus allowing more than two tables can be joined in asingle operation. The core technology is an innovative approach toindexing that accelerates multiple joins. Unfortunately, parallelism canonly reduce, not eliminate, the performance degradation issues relatedto the star schema.

[0028] One of the most fundamental principles of the multidimensionaldatabase is the idea of aggregation. The most common aggregation iscalled a roll-up aggregation. This type is relatively easy to compute:e.g. taking daily sales totals and rolling them up into a monthly salestable. The more difficult are analytical calculations, the aggregationof Boolean and comparative operators. However these are also consideredas a subset of aggregation.

[0029] In a star schema, the results of aggregation are summary tables.Typically, summary tables are generated by database administrators whoattempt to anticipate the data aggregations that the users will request,and then pre-build such tables. In such systems, when processing auser-generated query that involves aggregation operations, the pre-builtaggregated data that matches the query is retrieved from the summarytables (if such data exists). FIGS. 5A and 5B illustrate amulti-dimensional relational database using a star schema and summarytables. In this example, the summary tables are generated over the“time” dimension storing aggregated data for “month”, “quarter” and“year” time periods as shown in FIG. 5B. Summary tables are in essenceadditional fact tables, of higher levels. They are attached to the basicfact table creating a snowflake extension of the star schema There arehierarchies among summary tables because users at different levels ofmanagement require different levels of summarization. Choosing the levelof aggregation is accomplished via the “drill-down” feature.

[0030] Summary tables containing pre-aggregated results typicallyprovide for improved query response time with respect to on-the-flyaggregation. However, summary tables suffer from some disadvantages:

[0031] summary tables require that database administrators anticipatethe data aggregation operations that users will require; this is adifficult task in large multi-dimensional databases (for example, indata warehouses and data mining systems), where users always need toquery in new ways looking for new information and patterns.

[0032] summary tables do not provide a mechanism that allows efficientdrill down to view the raw data that makes up the summarytable—typically a table scan of one or more large tables is required.

[0033] querying is delayed until pre-aggregation is completed.

[0034] there is a heavy time overhead because the vast majority of thegenerated information remains unvisited.

[0035] there is a need to synchronize the summary tables before the use.

[0036] the degree of viable parallelism is limited because thesubsequent levels of summary tables must be performed in pipeline, dueto their hierarchies.

[0037] for very large databases, this option is not valid because oftime and storage space.

[0038] Note that it is common to utilize both pre-aggregated results andon-the-fly aggregation in support aggregation. In these system, partialpre-aggregation of the facts results in a small set of summary tables.On-the-fly aggregation is used in the case the required aggregated datadoes not exist in the summary tables.

[0039] Note that in the event that the aggregated data does not exist inthe summary tables, table join operations and aggregation operations areperformed over the raw facts in order to generate such aggregated data.This is typically referred to as on-the-fly aggregation. In suchinstances, aggregation indexing is used to mitigate the performance ofmultiple data joins associated with dynamic aggregation of the raw dataThus, in large multi-dimensional databases, such dynamic aggregation maylead to unacceptable query response times.

[0040] In view of the problems associated with joining and aggregationwithin RDBMS, prior art ROLAP systems have suffered from essentially thesame shortcomings and drawbacks of their underlying RDBMS.

[0041] While prior art MOLAP systems provide for improved access time toaggregated data within their underlying MDD structures, and haveperformance advantages when carrying out joining and aggregationsoperations, prior art MOLAP architectures have suffered from a number ofshortcomings and drawbacks which Applicants have detailed in theircopending U.S. application Ser. Nos. 09/368,241 and 09/514,611incorporated herein by reference.

[0042] In summary, such shortcomings and drawbacks stem from the factthat there is unidirectional data flow from the RDBMS to the MOLAPsystem. More specifically, atomic (raw) data is moved, in a singletransfer, to the MOLAP system for aggregation, analysis and querying.Importantly, the aggregation results are external to the RDBMS. Thus,users of the RDBMS cannot directly view these results. Such results areaccessible only from the MOLAP system. Because the MDD query processinglogic in prior art MOLAP systems is separate from that of the RDBMS,users must procure rights to access to the MOLAP system and beinstructed (and be careful to conform to such instructions) to accessthe MDD (or the RDBMS) under certain conditions. Such requirements canpresent security issues, highly undesirable for system administration.Satisfying such requirements is a costly and logistically cumbersomeprocess. As a result, the widespread applicability of MOLAP systems hasbeen limited.

[0043] Thus, there is a great need in the art for an improved mechanismfor joining and aggregating data elements within a relational databasemanagement system, and for integrating the improved relational databasemanagement system into informational database systems (including thedata warehouse and OLAP domains), while avoiding the shortcomings anddrawbacks of prior art systems and methodologies.

SUMMARY AND OBJECTS OF PRESENT INVENTION

[0044] Accordingly, it is an object of the present invention to providean improved method of and system for joining and aggregating dataelements integrated within a relational database management system(RDBMS) using a non-relational multi-dimensional data structure (MDD),achieving a significant increase in system performance (e.g. deceasedaccess/search time), user flexibility and ease of use.

[0045] Another object of the present invention is to provide such anRDBMS wherein its integrated data aggregation module supportshigh-performance aggregation (i.e. data roll-up) processes to maximizequery performance of large data volumes.

[0046] Another object of the present invention is to provide such anRDBMS system, wherein its integrated data aggregation (i.e. roll-up)module speeds up the aggregation process by orders of magnitude,enabling larger database analysis by lowering the aggregation times.

[0047] Another object of the present invention is to provide such anovel RDBMS system for use in OLAP operations.

[0048] Another object of the present invention is to provide a novelRDBMS system having an integrated aggregation module that carries out annovel rollup (i.e. down-up) and spread down (i.e. top-down) aggregationalgorithms.

[0049] Another object of the present invention is to provide a novelRDBMS system having an integrated aggregation module that carries outfull pre-aggregation and/or “on-the-fly” aggregation processes.

[0050] Another object of the present invention is to provide a novelRDBMS system having an integrated aggregation module which is capable ofsupporting a MDD having a multi-hierarchy dimensionality.

[0051] These and other object of the present invention will becomeapparent hereinafter and in the claims to Invention set forth herein.

BRIEF DESCRIPTION OF THE DRAWINGS

[0052] In order to more fully appreciate the objects of the presentinvention, the following Detailed Description of the IllustrativeEmbodiments should be read in conjunction with the accompanyingDrawings, wherein:

[0053]FIG. 1A is a schematic representation of a prior art informationdatabase system, wherein the present invention may be embodied.

[0054]FIG. 2 is a schematic representation of the prior art datawarehouse and OLAP system, wherein the present invention may beembodied.

[0055] FIGS. 3A-3C are schematic representations of exemplary tablesemployed in a prior art Relational Database Management System (RDBMS);FIGS. 3B and 3C illustrate operators (queries) on the table of FIG. 3A,and the result of such queries, respectively.

[0056]FIG. 4A is a schematic representation of an exemplary dimensionalschema (star schema) of a relational database.

[0057]FIG. 4B is a schematic representation of tables used to implementthe schema shown in FIG. 4A.

[0058]FIG. 5A is a schematic representation of an exemplarymultidimensional schema (star schema).

[0059]FIG. 5B is a schematic representation of tables used to implementthe schema of FIG. 5A, including summary tables storing results ofaggregation operations performed on the facts of the central fact tablealong the time-period dimension, in accordance with conventionalteachings.

[0060]FIG. 6A is a schematic representation of a generalized embodimentof an RDBMS of the present invention comprising a relational databasehaving an integrated multidimensional (MDD) aggregation modulesupporting queries from a plurality of clients, wherein the aggregationengine performs aggregation functions (e.g. summation of numbers, aswell as other mathematical operations, such as multiplication,subtraction, division etc.) and non-relational multi-dimensional datastorage functions.

[0061]FIG. 6B is a schematic block diagram of the MDD aggregation moduleof the illustrative embodiment of the present invention shown in FIG.6A.

[0062] FIGS. 6C(i) and 6C(ii), taken together, set forth a flow chartrepresentation of the primary operations carried out within the RDBMS ofthe present invention when performing data aggregation and relatedsupport operations, including the servicing of user-submitted (e.g.natural language) queries made on such aggregated database of thepresent invention.

[0063]FIG. 6D is a flow chart representation of the primary operationscarried out by the (DB) request serving mechanism within the MDD controlmodule shown in FIG. 6B.

[0064]FIG. 6E is a schematic representation of the view mechanism of theRDBMS that enables users to query on the aggregated data generatedand/or stored in the MDD Aggregation module according to the presentinvention.

[0065]FIG. 6F is a schematic representation of the trigger mechanism ofthe RDBMS that enables users to query on the aggregated data generatedand/or stored in the MDD Aggregation module according to the presentinvention.

[0066]FIG. 7A shows a separate-platform implementation of the RDBMSsystem of the illustrative embodiment shown in FIG. 6A, wherein thequery handling, fact table(s) and dictionary of the RDBMS resides on aseparate hardware platform and/or OS system from that used to run theMDD Aggregation Module of the present invention.

[0067]FIG. 7B shows a common-platform implementation of the RDBMS systemof the illustrative embodiment shown in Fig. A, wherein the queryhandling, fact table(s) and dictionary of the RDBMS shares the samehardware platform and operating system (OS) that used to run the MDDAggregation Module of the present invention.

[0068]FIG. 8A is a data table setting forth information representativeof performance benchmarks obtained by the shared-platform typeimplementation of the MDD Aggregation Module of the illustrativeembodiment serving the conventional OLAP server (i.e. Oracle EXPRESSServer, wherein the common hardware/software platform is realized usinga Pentium II 450 Mhz, 1 GB RAM, 18 GB Disk, running the Microsoft NToperating system (OS);

[0069]FIG. 9A is a schematic representation of the first stage in themethod of segmented aggregation according to the principles of thepresent invention, showing initial aggregation along the 1st dimension.

[0070]FIG. 9B is a schematic representation of the next stage in themethod of segmented aggregation according to the principles of thepresent invention, showing that any segment along dimension 1, such asthe shown slice, can be separately aggregated along the remainingdimensions, 2 and 3, and that in general, for an N dimensional system,the second stage involves aggregation in N−1 dimensions. The principleof segmentation can be applied on the first stage as well, however, onlya large enough data will justify such a sliced procedure in the firstdimension. Actually, it is possible to consider each segment as an N−1cube, enabling recursive computation.

[0071]FIG. 9C1 is a schematic representation of the Query DirectedRoll-up (QDR) aggregation method/procedure of the present invention,showing data aggregation starting from existing basic data or previouslyaggregated data in the first dimension (D1), and such aggregated databeing utilized as a basis for QDR aggregation along the second dimension(D2).

[0072]FIG. 9C2 is a schematic representation of the Query DirectedRoll-up (QDR) aggregation method/procedure of the present invention,showing initial data aggregation starting from existing previouslyaggregated data in the second third (D3), and continuing along the thirddimension (D3), and thereafter continuing aggregation along the seconddimension (D2).

[0073]FIG. 10A is a schematic representation of the “slice-storage”method of storing sparse data in the disk storage devices of the MDDB ofFIG. 6B in accordance with the principles of the present invention,based on an ascending-ordered index along aggregation direction,enabling fast retrieval of data.

[0074]FIG. 10B is a schematic representation of the data organization ofdata files and the directory file used in the storage of the MDDB ofFIG. 6B, and the method of searching for a queried data point thereinusing a simple binary search technique due to the data files ascendingorder;

[0075]FIG. 11A is a schematic representation of three exemplarymulti-hierarchical data structures for storage of data within the MDDBof FIG. 6B, having three levels of hierarchy, wherein the first levelrepresentative of base data is composed of items A, B, F, and G, thesecond level is composed of items C, E, H and I, and the third level iscomposed of a single item D, which is common to all three hierarchicalstructures.

[0076]FIG. 11B is a schematic representation of an optimizedmulti-hierarchical data structure merged from all three hierarchies ofFIG. 11A, in accordance with the principles of the present invention.

[0077]FIG. 12 is a schematic representation showing the levels ofoperations performed by the stand-alone Aggregation Server of FIG. 6B,summarizing the different enabling components for carrying out themethod of segmented aggregation in accordance with the principles of thepresent invention.

[0078]FIG. 13 is a schematic representation of the RDBMS of the presentinvention shown as a component of a central data warehouse, serving thedata storage and aggregation needs of a ROLAP system (or other OLAPsystem).

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS OF THE PRESENTINVENTION

[0079] Referring now to FIG. 6 through FIG. 13, the preferredembodiments of the method and system of the present invention will benow described in great detail herein below.

[0080] Through this document, the term “aggregation” and“pre-aggregation” shall be understood to mean the process of summationof numbers, as well as other mathematical operations, such asmultiplication, subtraction, division etc. It shall be understood thatpre-aggregation operations occur asynchronously with respect to thetraditional query processing operations. Moreover, the term “atomicdata” shall be understood to refer to the lowest level of datagranularity required for effective decision making. In the case of aretail merchandising manager, atomic data may refer to information bystore, by day, and by item. For a banker, atomic data may be informationby account, by transaction, and by branch.

[0081] In general, the improved RDBMS system of the present inventionexcels in performing two distinct functions, namely: the aggregation ofdata; and the handling of the resulting data for “on demand” client use.Moreover, because of improved data aggregation capabilities, the RDBMSof the present invention can be employed in a wide range ofapplications, including Data Warehouses supporting OLAP systems and thelike. For purposes of illustration, initial focus will be accorded tothe RDMS of the present invention.

[0082]FIG. 6A illustrates the primary components of an illustrativeembodiment of the RDBMS of the present invention, namely: supportmechanisms including a query interface and query handler, a relationaldata store including one or more fact tables and a meta-data store forstoring a dictionary (catalogue), and an MDD Aggregation Module thatstores atomic data and aggregated data in a non-relational MDD. Itshould be noted that the RDBMS typically includes additional components(not shown) that are not relevant to the present invention. The queryinterface and query handler service user-submitted queries (in thepreferred embodiment, SQL query statements) forwarded, for example, froma client machine over a network as shown. The query handler, fact tableand meta-data store communicate to the MDD Aggregation Module.Importantly, the query handler and integrated MDD Aggregation Moduleoperates to provide for dramatically improved query response times fordata aggregation operations and drill-downs. Moreover, it is an objectof the present invention is to make user-querying of the non-relationalMDDB no different than querying a relational table of the RDBMS, in amanner that minimizes the delays associated with queries that involveaggregation or drill down operations. This object is enabled byproviding the novel RDBMS system and integrated aggregation mechanism ofthe present invention.

[0083]FIG. 6B shows the primary components of an illustrative embodimentof the MDD Aggregation Module of FIG. 6A, namely: a base data loader forloading the directory and fact table(s) of the RDBMS; an aggregationengine for receiving dimension data and fact data from the base loader,a multi-dimensional database (MDDB); a MDDB handler, an SQL handler thatoperates cooperatively with the query handler of the RDBMS to provideusers with query access to the MDD Aggregation Module, and a controlmodule for managing the operation of the components of the MDDaggregation module. The base data loader may load the directory and facttables over a standard interface (such as OLDB, OLE-DB, ODBC, SQL, API,JDBC, etc.). In this case, the RDBMS and base data loader includecomponents that provide communication of such data over these standardinterfaces. Such interface components are well known in the art. Forexample, such interface components are readily available from AttunityCorporation, http://www.attunity.com.

[0084] During operation, the base data originates from the fact table(s)of the RDBMS. The core data aggregation operations are performed by theAggregation Engine; a Multidimensional Data Handler; and aMultidimensional Data Storage. The results of data aggregation areefficiently stored in a multidimensional data storage (MDDB), by theData Handler. The SQL handler of the MDD Aggregation module servicesuser-submitted queries (in the preferred embodiment, SQL querystatements) forwarded from the query handler of the RDBMS. The SQLhandler of the MDD Aggregation module may communicate with the queryhandler of the RDBMS over a standard interface (such as OLDB, OLE-DB,ODBC, SQL, API, JDBC, etc.). In this case, the support mechanisms of theRDBMS and SQL handler include components that provide communication ofsuch data over these standard interfaces. Such interface components arewell known in the art. Aggregation (or drill down results) are retrievedon demand and returned to the user.

[0085] Typically, a user interacts with a client machine (for example,using a web-enabled browser) to generate a natural language query, thatis communicated to the query interface of the RDBMS, for example over anetwork as shown. The query interface disintegrates the query, viaparsing, into a series of requests (in the preferred embodiment, SQLstatements) that are communicated to the query handler of the RDBMS. Itshould be noted that the functions of the query interface may beimplemented in a module that is not part of the RDBMS (for example, inthe client machine). The query handler of the RDBMS forwards requeststhat involve data stored in the MDD of the MDD Aggregation module to theSQL hander of the MDD Aggregation module for servicing. Each requestspecifies a set of n-dimensions. The SQL handler of the MDD AggregationModule extracts this set of dimensions and operates cooperatively withthe MDD handler to address the MDDB using the set of dimensions,retrieve the addressed data from the MDDB, and return the results to theuser via the query handler of the RDBMS.

[0086] FIGS. 6C(i) and 6C(ii) is a flow chart illustrating theoperations of an illustrative RDBMS of the present invention. In step601, the base data loader of the MDD Aggregation Module loads thedictionary (or catalog) from the meta-data store in the RDBMS. Inperforming this function, the base data loader may utilize an adapter(interface) that maps the data types of the dictionary of the RDBMS (orthat maps a standard data type used to represent the dictionary of theRDBMS) into the data types used in the MDD aggregation module. Inaddition, the base data loader extracts the dimensions from thedictionary and forwards the dimensions to the aggregation engine of theMDD Aggregation Module.

[0087] In step 603, the base data loader loads the fact table(s) fromthe RDBMS. In performing this function, the base data loader may utilizean adapter (interface) that maps the data types of the fact table(s) ofthe RDBMS (or that maps a standard data type used to represent the facttable(s) of the RDBMS) into the data types used in the MDD AggregationModule. In addition, the base data loader extracts the atomic data fromthe fact table, and forwards the atomic data to the aggregation engine.

[0088] In step 605, the aggregation engine rolls-up (aggregates) theatomic data (provided by the base data loader in step 603) along atleast one of the dimensions and operates cooperatively with the MDDhandler to store the resultant aggregated data in the MDD database. Amore detailed description of exemplary aggregation operations accordingto a preferred embodiment of the present invention is set forth belowwith respect to the QDR process of FIGS. 9A-9C.

[0089] In step 607, a reference is defined that provides users with theability to query the data generated by the MDD Aggregation Module and/orstored in the MDDB of the MDD Aggregation Module. This reference ispreferably defined using the Create View SQL statement, which allows theuser to: i) define a table name (TN) associated with the MDD databasestored in the MDD Aggregation Module, and ii) define a link used toroute SQL statements on the table TN to the MDD Aggregation Module. Inthis embodiment, the view mechanism of the RDBMS enables reference andlinking to the data stored in the MDDB of the MDD Aggregation Engine asillustrated in FIG. 6(E). A more detailed description of the viewmechanism and the Create View SQL statement may be found in C. J. Date,“An Introduction to Database Systems,” Addison-Wesley, Seventh Edition,2000, pp. 289-326, herein incorporated by reference in its entirety.Thus, the view mechanism enables the query handler of the RDBMS systemto forward any SQL query on table TN to the MDD aggregation module viathe associated link. In an alternative embodiment, a direct mechanism(e.g., NA trigger mechanism) may be used to enable the RDBMS system toreference and link to the data generated by the MDD Aggregation Moduleand/or stored in the MDDB of the MDD Aggregation Engine as illustratedin FIG. 6F. A more detailed description of trigger mechanisms andmethods may be found in C. J. Date, “An Introduction to DatabaseSystems,” Addison-Wesley, Seventh Edition, 2000, pp. 250, 266, hereinincorporated by reference in its entirety.

[0090] In step 609, a user interacts with a client machine to generate aquery, and the query is communicated to the query interface. The queryinterface generate one or more SQL statements on the reference definedin step 607 (this reference refers to the data stored in the MDDB of theMDD Aggregation Module), and forwards the SQL statement(s) to the queryhandler of the RDBMS.

[0091] In step 611, the query handler receives the SQL statement(s); andoptionally transforms such SQL statement(s) to optimize the SQLstatement (s) for more efficient query handling. Such transformationsare well known in the art. For example, see Kimball, AggregationNavigation With (Almost) No MetaData”, DBMS Data Warehouse Supplement,August 1996, available at http://www.dbmsmag.com/9608d54.html.

[0092] In step 613: the query handler determines whether the receivedSQL statement(s) [or transformed SQL statement(s)] is on the referencegenerated in step 607. If so, operation continues to step 615; otherwisenormal query handling operations continue is step 625

[0093] In step 615, the received SQL statement(s) [or transformed SQLstatement(s)] is routed to the MDD aggregation engine for processing instep 617 using the link for the reference as described above withrespect to step 607.

[0094] In step 617, the SQL statement(s) is received by the SQL handlerof the MDD Aggregation Module, wherein a set of one or moreN-dimensional coordinates are extracted from the SQL statement. Inperforming this function, SQL handler may utilize an adapter (interface)that maps the data types of the SQL statement issued by query handler ofthe RDBMS (or that maps a standard data type used to represent the SQLstatement issued by query handler of the RDBMS) into the data types usedin the MDD aggregation module.

[0095] In step 619, the set of N-dimensional coordinates extracted instep 617 are used by the MDD handler to address the MDDB and retrievethe corresponding data from the MDDB.

[0096] Finally, in step 621, the retrieved data is returned to the uservia the RDBMS (for example, by forwarding the retrieved data to the SQLhandler, which returns the retrieved data to the query handler of theRDBMS system, which returns the results of the user-submitted query tothe user via the client machine), and the operation ends.

[0097] It should be noted that the facts (base data), as it arrives fromRDBMS, may be analyzed and reordered to optimize hierarchy handling,according to the unique method of the present invention, as describedlater with reference to FIGS. 11A and 11B.

[0098] Moreover, the MDD control module of the MDD Aggregation Modulepreferably administers the aggregation process according to the methodillustrated in FIGS. 9A and 9B. Thus, in accordance with the principlesof the present invention, data aggregation within the RDBMS can becarried out either as a complete pre-aggregation process, where the basedata is fully aggregated before commencing querying, or as a querydirected roll-up (QDR) process, where querying is allowed at any stageof aggregation using the “on-the-fly” data aggregation process of thepresent invention. The QDR process will be described hereinafter ingreater detail with reference to FIG. 9C. The response to a request(i.e. a basic component of a client query) requiring “on-the-fly” dataaggregation, or requiring access to pre-aggregated result data via theMDD handler is provided by a query/request serving mechanism of thepresent invention within the MDD control module, the primary operationsof which are illustrated in the flow chart of FIG. 6D. The function ofthe MDD Handler is to handle multidimensional data in the storage(s)module in a very efficient way, according to the novel method of thepresent invention, which will be described in detail hereinafter withreference to FIGS. 10A and 10B.

[0099] The SQL handling mechanism shown in FIG. 6D is controlled by theMDD control module. Requests are queued and served one by one. If therequired data is already pre-calculated, then it is retrieved by the MDDhandler and returned to the client. Otherwise, the required data iscalculated “on-the-fly” by the aggregation engine, and the result movedout to the client, while simultaneously stored by the MDD handler, shownin FIG. 6C.

[0100]FIGS. 7A and 7B outline two different implementations of the RDBMSsystem of the present invention. In both implementations, the queryhandler of the RDBMS system supplies aggregated results retrieved fromthe MDD to a client.

[0101]FIG. 7A shows a separate-platform implementation of the RDBMSsystem of the illustrative embodiment shown in FIG. 6A, wherein thesupport mechanisms (query handling) and relational data store (facttable(s) and dictionary) of the RDBMS resides on a separate hardwareplatform and/or OS system from that used to run the MDD AggregationModule. In this type of implementation, it is even possible to run partsof the RDBMS system and the MDD Aggregation Module on different-typeoperating systems (e.g. NT, Unix, MAC OS).

[0102]FIG. 7B shows a common-platform implementation of the RDBMS systemof the illustrative embodiment shown in Fig. A, wherein the supportmechanisms (query handling) and the relational data store (fact table(s)and dictionary) of the RDBMS shares the same hardware platform andoperating system (OS) that used to run the MDD Aggregation Module.

[0103]FIG. 8A shows a table setting forth the benchmark results of theMDD aggregation module, in accordance with the principles of the presentinvention. The platform and OS is realized using a Pentium II 450 Mhz, 1GB RAM, 18 GB Disk, running the Microsoft NT operating system. The six(6) data sets shown in the table differ in number of dimensions, numberof hierarchies, measure of sparcity and data size. A comparison withORACLE Express, a major OLAP server, is made. It is evident that the MDDaggregation module of the present invention outperforms currentlyleading aggregation technology by more than an order of magnitude.

[0104] Preferably, the MDD aggregation module of the RDBMS of thepresent invention supports a segmented data aggregation method asdescribed in FIGS. 9A through 9C2. These figures outline a simplifiedsetting of three dimensions only; however, the following analysisapplies to any number of dimensions as well.

[0105] The data is being divided into autonomic segments to minimize theamount of simultaneously handled data. The initial aggregation ispracticed on a single dimension only, while later on the aggregationprocess involves all other dimensions.

[0106] At the first stage of the aggregation method, an aggregation isperformed along dimension 1. The first stage can be performed on morethan one dimension. As shown in FIG. 9A, the space of the base data isexpanded by the aggregation process.

[0107] In the next stage shown in FIG. 9B, any segment along dimension1, such as the shown slice, can be separately aggregated along theremaining dimensions, 2 and 3. In general, for an N dimensional system,the second stage involves aggregation in N−1 dimensions.

[0108] The principle of data segmentation can be applied on the firststage as well. However, only a large enough data set will justify such asliced procedure in the first dimension. Actually, it is possible toconsider each segment as an N−1 cube, enabling recursive computation.

[0109] It is imperative to get aggregation results of a specific slicebefore the entire aggregation is completed, or alternatively, to havethe roll-up done in a particular sequence. This novel feature of theaggregation method of the present invention is that it allows thequerying to begin, even before the regular aggregation process isaccomplished, and still having fast response. Moreover, in relationalOLAP and other systems requiring only partial aggregations, the QDRprocess dramatically speeds up the query response.

[0110] The QDR process is made feasible by the slice-oriented roll-upmethod of the present invention. After aggregating the firstdimension(s), the multidimensional space is composed of independentmultidimensional cubes (slices). These cubes can be processed in anyarbitrary sequence.

[0111] Consequently the aggregation process of the present invention canbe monitored by means of files, shared memory sockets, or queues tostatically or dynamically set the roll-up order.

[0112] In order to satisfy a single query, before the requiredaggregation result has been prepared, the QDR process of the presentinvention involves performing a fast on-the-fly aggregation (roll-up)involving only a thin slice of the multidimensional data.

[0113]FIG. 9C1 shows a slice required for building-up a roll-up resultof the 2^(nd) dimension. In case 1, as shown, the aggregation startsfrom an existing data, either basic or previously aggregated in thefirst dimension. This data is utilized as a basis for QDR aggregationalong the second dimension. In case 2, due to lack of previous data, aQDR involves an initial slice aggregation along dimension 3, andthereafter aggregation along the 2^(nd) dimension.

[0114]FIG. 9C2 shows two corresponding QDR cases for gaining results inthe 3d dimension. Cases 1 and 2 differ in the amount of initialaggregation required in 2^(nd) dimension.

[0115]FIG. 10A illustrates the “Slice-Storage” method of storing sparsedata on storage disks. In general, this data storage method is based onthe principle that an ascending-ordered index along aggregationdirection, enables fast retrieval of data. FIG. 10A illustrates aunit-wide slice of the multidimensional cube of data. Since the data issparse, only few non-NA data points exist. These points are indexed asfollows. The Data File consists of data records, in which each n−1dimensional slice is being stored, in a separate record. These recordshave a varying length, according to the amount of non-NA stored points.For each registered point in the record, IND_(k) stands for an index ina n-dimensional cube, and Data stands for the value of a given point inthe cube.

[0116]FIG. 10B illustrates a novel method for randomly searching for aqueried data point in the MDD of the RDBMS of the present invention byusing a novel technique of organizing data files and the directory fileused in the storage of the MDD, so that a simple binary search techniquecan then be employed within the aggregation module of the RDMB.According to this method, a metafile termed DIR File, keeps pointers toData Files as well as additional parameters such as the start and endaddresses of data record (IND₀, IND_(n)), its location within the DataFile, record size (n), file's physical address on disk (D_Path), andauxiliary information on the record (Flags).

[0117] A search for a queried data point is then performed by an accessto the DIR file. The search along the file can be made using a simplebinary search due to file's ascending order. When the record is found,it is then loaded into main memory to search for the required point,characterized by its index IND_(k). The attached Data field representsthe queried value. In case the exact index is not found, it means thatthe point is a NA.

[0118]FIGS. 11A and 11B illustrate a novel method performed by the MDDaggregation module of the RDBMS of the present invention forpre-processing data such that multi-hierarchies in multi-hierarchicalstructures are optimally merged. According to the devised method, theinner order of hierarchies within a dimension is optimized, to achieveefficient data handling for summations and other mathematical formulas(termed in general “Aggregation”). The order of hierarchy is definedexternally. It is brought from a data source to the stand-aloneaggregation engine, as a descriptor of data, before the data itself. Inthe illustrative embodiment, the method assumes hierarchical relationsof the data, as shown in FIG. 11A. The way data items are ordered in thememory space of the Aggregation Server, with regard to the hierarchy,has a significant impact on its data handling efficiency.

[0119] Notably, when using prior art techniques, multiple handling ofdata elements, which occurs when a data element is accessed more thanonce during aggregation process, has been hitherto unavoidable when themain concern is to effectively handle the sparse data. The datastructures used in prior art data handling methods have been designedfor fast access to a available data (not NA data). According to priorart techniques, each access is associated with a timely search andretrieval in the data structure. For the massive amount of datatypically accessed from a Data Warehouse in an OLAP application, suchmultiple handling of data elements has significantly degraded theefficiency of prior art data aggregation processes. When using prior artdata handling techniques, the data element D shown in FIG. 11A must beaccessed three times, causing poor aggregation performance.

[0120] In accordance with the present invention, the MDD aggregationmodule of the RDBMS performs the loading of base data and theaggregation and storage of the aggregated data in a way that limits theaccess of to a singular occurrence, as opposed to multiple occurrencesas taught by prior art methods. According to the present invention,elements of base data and their aggregated results are contiguouslystored in a way that each element will be accessed only once. Thisparticular order allows a forward-only handling, never backward. Once abase data element is stored, or aggregated result is generated andstored, it is never to be retrieved again for further aggregation. As aresult the storage access is minimized. This way of singular handlinggreatly elevates the aggregation efficiency of large data bases. Thedata element D, as any other element, is accessed and handled only once.

[0121]FIG. 11A shows an example of a multi-hierarchical databasestructure having 3 hierarchies. As shown, the base data includes theitems A, B, F, and G., The second level is composed of items C, E, H andI. The third level has a single item D, which is common to all threehierarchical structures. In accordance with the method of the presentinvention, a minimal computing path is always taken. For example,according to the method of the present invention, item D will becalculated as part of structure 1, requiring two mathematical operationsonly, rather than as in structure 3, which would need four mathematicaloperations. FIG. 11B depicts an optimized structure merged from allthree hierarchies.

[0122]FIG. 12 summarizes the different enabling components for segmentedaggregation. The minimized operations in handling multi-hierarchies needanalysis of the base data. It greatly optimizes data handling andcontribute to aggregation speed. Based on this technology loading andindexing operations become very efficient, minimizing memory and storageaccess, and speeding up storing and retrieval operations. On top of allthe enabling technologies is the segmented aggregation technique, notjust outperforming by orders of magnitude the prior-art aggregationalgorithms, but also enabling the unique QDR which waves out the need ofwaiting for full pre-aggregation.

[0123]FIG. 13 shows the improved RDBMS of the present invention as acomponent of a data warehouse, serving the data storage and aggregationneeds of a ROLAP system (or other OLAP systems alike). Importantly, theimproved RDBMS of the present invention provides flexible,high-performance access and analysis of large volumes of complex andinterrelated data. Moreover, the improved Data Warehouse RDBMS of thepresent invention can simultaneously serve many different kinds ofclients (e.g. data mart, OLAP, URL) and has the power of delivering anenterprise-wide data storage and aggregation in a cost-effective way.This kind of system eliminates redundancy over the group of clients,delivering scalability and flexibility. Moreover, the improved RDBMS ofthe present invention can be used as the data store component of in anyinformational database system as described above, including dataanalysis programs such as spread-sheet modeling programs, serving thedata storage and aggregation needs of such systems.

[0124] Functional Advantages Gained by the Improved RDBMS Of The PresentInvention

[0125] The features of the RDBMS of the present invention, provides fordramatically improved response time in handling queries issued to theRDBMS that involve aggregation, thus enabling enterprise-widecentralized aggregation. Moreover, in the preferred embodiment of thepresent invention, users can query the aggregated data in an manner nodifferent than traditional queries on an RDBMS.

[0126] The method of Segmented Aggregation employed by the novel RDBMSof the present invention provides flexibility, scalability, thecapability of Query Directed Aggregation, and speed improvement.

[0127] Moreover, the method of Query Directed Aggregation (QDR) employedby the novel RDBMS of the present invention minimizes the data handlingoperations in multi-hierarchy data structures, eliminates the need towait for full aggregation to be complete, and provides for buildup ofaggregated data required for full aggregation.

[0128] It is understood that the System and Method of the illustrativeembodiments described herein above may be modified in a variety of wayswhich will become readily apparent to those skilled in the art of havingthe benefit of the novel teachings disclosed herein. All suchmodifications and variations of the illustrative embodiments thereofshall be deemed to be within the scope and spirit of the presentinvention as defined by the claims to Invention appended hereto.

What is claimed is:
 1. A relational database management system (RDBMS)comprising: a relational data store storing fact data; an aggregationmodule, operatively coupled to the relational data store, foraggregating the fact data and storing the resultant aggregated data in anon-relational multi-dimensional data store; a query servicingmechanism, operatively coupled to the aggregation module, for servicingquery statements generated in response to user input, said queryservicing mechanism comprising: a reference generating mechanism forgenerating a user-defined reference to aggregated fact data generated bythe aggregation module; and a query processing mechanism for processinga given query statement, wherein, upon identifying that the given querystatement is on said user-defined reference, communicates with saidaggregation module to retrieve portions of aggregated fact data pointedto by said reference that are relevant to said given query statement. 2.The RDBMS of claim 1, wherein said aggregation module includes a queryhandling mechanism for receiving query statements, and whereincommunication between said query processing mechanism and said queryhandling mechanism is accomplished by forwarding the given querystatement to the query handling mechanism of the aggregation module. 3.The RDBMS of claim 2, wherein said query handling mechanism extractsdimensions from the received query statement and forwards the dimensionsto the storage handler, and wherein the storage handler accesseslocations of the non-relational multi-dimensional data store based uponthe forwarded dimensions and returns the retrieved data back to thequery servicing mechanism for communication to the user.
 4. The RDBMS ofclaim 1, wherein said aggregation module includes a data loadingmechanism for loading at least fact data from the relational data store,an aggregation engine for aggregating the fact data and an storagehandler for storing the fact data and resultant aggregated fact data inthe non-relational multi-dimensional data store.
 5. The RDBMS of claim4, wherein said aggregation module includes control logic that, upondetermining that the non-relational multi-dimensional data store doesnot contain data required to service the given query statement, controlsthe data loading mechanism and aggregation engine to aggregate at leastfact data required to service the given query statement and controls theaggregation module to return the aggregated data back to the queryservicing mechanism for communication to the user.
 6. The RDBMS of claim1, in combination with a data analysis engine to realize an OLAP system.7. The RDBMS of claim 6, wherein said OLAP system is a ROLAP system. 8.The RDBMS of claim 1, for use as an enterprise wide data warehouse thatinterfaces to a plurality of information technology systems.
 9. TheRDBMS of claim 1, for use as a database store in an informationaldatabase system.
 10. The RDBMS of claim 9, wherein said informationaldatabase system is a spreadsheet modeling program.
 11. The RDBMS ofclaim 1, wherein said query statements are generated by a queryinterface in response to communication of a natural language querycommunicated from a client machine.
 12. The RDBMS of claim 11, whereinsaid client machine comprises a web-enabled browser to communicate saidnatural language query to the query interface.
 13. A relational databasemanagement system (RDBMS) comprising: a relational data store storingfact data; an integrated aggregation module, operatively coupled to therelational data store, for aggregating the fact data and storing theresultant aggregated data in a non-relational multidimensional datastore.
 14. The RDBMS of claim 13, wherein user operations in querying ofthe non-relational multi-dimensional data store is no different thanquerying data in the relational store.
 15. The RDBMS of claim 13,wherein said aggregation module includes a data loading mechanism forloading at least fact data from the relational data store, anaggregation engine for aggregating the fact data and an storage handlerfor storing the fact data and resultant aggregated fact data in thenon-relational multi-dimensional data store.
 16. The RDBMS of claim 15,wherein said aggregation module includes control logic that, upondetermining that the non-relational multi-dimensional data store doesnot contain data required to service a given query statement, controlsthe data loading mechanism and aggregation engine to aggregate at leastfact data required to service the given query statement and controls theaggregation module to return the aggregated data for communication tothe user.
 17. The RDBMS of claim 13, in combination with a data analysisengine to realize an OLAP system.
 18. The RDBMS of claim 17, whereinsaid OLAP system is a ROLAP system.
 19. The RDBMS of claim 13, for useas an enterprise wide data warehouse that interfaces to a plurality ofinformation technology systems.
 20. The RDBMS of claim 13, for use as adatabase store in an informational database system.
 21. The RDBMS ofclaim 20, wherein said informational database system is a spreadsheetmodeling program.
 22. The RDBMS of claim 13, wherein said querystatements are generated by a query interface in response tocommunication of a natural language query communicated from a clientmachine.
 23. The RDBMS of claim 22, wherein said client comprises aweb-enabled browser to communicate said natural language query to thequery interface.
 24. In a relational database management system (RDBMS)comprising a relational data store storing fact data, a method foraggregating the fact data and providing query access to the aggregateddata comprising the steps of: providing an integrated aggregationmodule, operatively coupled to the relational data store, foraggregating the fact data and storing the resultant aggregated data in anon-relational multi-dimensional data store; in response to user input,generating a reference to aggregated fact data generated by theaggregation module; and processing a given query statement generated inresponse to user input, wherein, upon identifying that the given querystatement is on said reference, retrieving from the integratedaggregation module portions of aggregated fact data pointed to by saidreference that are relevant to said given query statement.
 25. Themethod of claim 24, further comprising the step of extracting dimensionsfrom the given query statement, accessing locations of thenon-relational multi-dimensional data store based upon the extracteddimensions, and returning the retrieved data back to the user.
 26. Themethod of claim 24, wherein said aggregation module includes a dataloading mechanism for loading at least fact data from the relationaldata store, an aggregation engine for aggregating the fact data and anstorage handler for storing the fact data and resultant aggregated factdata in the non-relational multi-dimensional data store.
 27. The methodof claim 26, wherein said aggregation module, upon determining that thenon-relational multi-dimensional data store does not contain datarequired to service the given query statement, controls the data loadingmechanism and aggregation engine to aggregate at least fact datarequired to service the given query statement and controls theaggregation module to return the aggregated data back to the user. 28.The method of claim 24, further comprising the step of performing dataanalysis operations on the retrieved data as part of an OLAP system. 29.The method of claim 28, RDBMS of claim 6, wherein said OLAP system is aROLAP system.
 30. The method of claim 24, wherein said RDBMS is used asan enterprise wide data warehouse that interfaces to a plurality ofinformation technology systems.
 31. The method of claim 24, wherein saidRDBMS is uses as a database store in an informational database system.32. The method of claim 31, wherein said informational database systemis a spreadsheet modeling program.
 33. The method of claim 24, whereinsaid query statements are generated by a query interface in response tocommunication of a natural language query communicated from a clientmachine.
 34. The method of claim 33, wherein said client machinecomprises a web-enabled browser to communicate said natural languagequery to the query interface.